The  research  performed  by  James  Todd  during  the  past  three  years  of  AFOSR  support
has  examined  the  abilities  of  human  observers  to  determine  an  object's  3-dimensional
form  from  various  types  of  optical  information  such  as  shading,  texture,  motion
or  binocular  disparity,  both  individually  and  in  combination.  The  results  of  this
research  provide  strong  evidence  that  our  perceptual  representations  of  3D  metrical
properties  are  surprisingly  inaccurate  and  imprecise,  but  that  observers  are  quite
good  at  judging  ordinal  or  nominal  relations  among  different  surface  regions.  We
have  also  examined  how  these  judgments  are  influenced  by  combining  different  types
of  optical  information  using  both  computer  simulations  and  direct  viewing  of  natural
scenes.

Visual  Perception  of  3-Dimensional  Form  from  Different
Types  of  Optical  Deformations 

#F49620-93-1-0116

Final  Technical  Report:  Feb.  1, 1993  -  Aug.  31, 1996 


The  research  performed  in  my  laboratory  during  the  past  three  years  under  AFOSR 
support  has  been  designed  to  investigate  the  fundamental  mechanisms  involved  in  the  perceptual 
analysis  of  3-dimensional  form  and  motion.  Much  of  this  research  has  been  specifically 
motivated  from  a  computational  perspective.  Our  basic  research  strategy  has  been  to  identify  the 
key  assumptions  of  various  competing  models,  and  to  empirically  investigate  the  relative 
psychological  validity  of  those  assumptions  using  appropriate  psychophysical  procedures. 

There  are  several  different  properties  of  optical  structure  from  which  observers  are  able  to 
obtain  information  about  the  structure  of  the  environment.  We  have  tried  in  our  research  to 
consider  a  broad  spectrum  of  issues  involving  the  analysis  and  integration  of  these  different 
sources  of  information,  and  to  identify  any  common  mechanisms  or  strategies  that  may  be 
involved  in  multiple  domains  of  perceptual  processing.  A  brief  summary  of  these  ongoing 
programs  is  provided  below. 

The  Detection  of  Motion 

One  line  of  research  performed  in  my  laboratory  has  been  designed  to  investigate  the 
competitive  and  cooperative  interactions  involved  in  the  detection  of  coherent  motion.  For
example,  Norman,  Norman,  Todd  &  Lindsey  (in  press)  have  examined  how  perceived  speed  in 
one  region  of  a  display  is  influenced  by  the  patterns  of  motion  in  neighboring  regions,  while 
Lindsey  and  Todd  (1996)  have  investigated  the  role  of  luminance  constraints  on  the  perceived 
coherence  of  moving  plaids. 

A  related  series  of  experiments  by  Todd  &  Norman  (1995)  and  Lindsey  &  Todd 
(submitted)  has  examined  how  processes  of  spatiotemporal  integration  can  be  used  to  filter 
spurious  correlations  (i.e.,  false  targets)  that  can  arise  from  the  motions  of  densely  textured 
patterns.  In  these  studies  we  measured  the  maximum  displacement  and  signal-to-noise 
thresholds  for  moving  patterns  that  varied  in  size,  shape,  duration  and  eccentricity,  and  contained 
variable  amounts  of  uncorrelated  noise.  The  results  indicate  that  the  human  visual  system  may 
have  multiple  mechanisms  of  noise  filtering  to  facilitate  the  detection  of  coherent  motion, 
including  averaging  over  space  and  correlations  of  features  or  motions  over  multiple  time 
intervals. 
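
As a rough illustration of why averaging over space can act as a noise filter, the sketch below pools the displacements of many dots, only some of which carry the coherent signal. It is not the stimulus code or the model from these studies. It assumes that dot correspondence between frames is already known, which sidesteps the false-target problem, and the function names, dot count and 50% signal proportion are all hypothetical.

```python
import numpy as np

def noisy_displacements(n_dots, signal_fraction, signal_step, rng):
    """Return per-dot 2D displacements for one frame transition.

    Signal dots all move by `signal_step`; noise dots move the same
    distance in independent, uniformly random directions.
    """
    n_signal = int(round(n_dots * signal_fraction))
    step = np.asarray(signal_step, dtype=float)
    speed = np.linalg.norm(step)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_dots - n_signal)
    noise = speed * np.column_stack((np.cos(angles), np.sin(angles)))
    signal = np.tile(step, (n_signal, 1))
    return np.vstack((signal, noise))

def pooled_motion(displacements):
    """Spatial averaging: the mean displacement over all dots."""
    return displacements.mean(axis=0)

rng = np.random.default_rng(0)
moves = noisy_displacements(n_dots=2000, signal_fraction=0.5,
                            signal_step=(1.0, 0.0), rng=rng)
estimate = pooled_motion(moves)
print("pooled displacement:", estimate)
```

Because the noise directions are independent, their contributions largely cancel in the mean, so the pooled estimate points in the signal direction with a length close to the signal proportion times the step size.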

Perceived  Structure  from  Motion 

Another  line  of  research  we  have  been  pursuing  for  several  years  involves  the  ability  of 
observers  to  determine  an  object’s  3D  structure  from  its  pattern  of  projected  motion.  There  have 
been  numerous  theoretical  analyses  to  prove  that  a  complete  Euclidean  reconstruction  of  an  object
under  orthographic  projection  requires  a  minimum  of  three  distinct  views,  but  the  empirical 
evidence  suggests  that  human  observers  may  only  be  sensitive  to  the  first  order  spatiotemporal 
relations  between  pairs  of  views.  The  theoretical  consequence  of  this  limitation  is  that  3D 
structure  from  motion  would  only  be  specified  up  to  a  1-parameter  family  of  possible 
interpretations.  In  addition,  observers  would  be  unable  to  detect  certain  types  of  nonrigid
distortions  of  an  object  that  are  invisible  to  2-view  analyses,  but  which  could  easily  be  detected
by  more  traditional  algorithms  for  computing  Euclidean  metric  structure.  In  our  most  recent
investigations  we  have  developed  a  variety  of  paradigms  for  testing  these  predictions  (see 
Norman  &  Todd,  1993;  Perotti,  Todd,  Lappin  &  Phillips,  1995;  Perotti,  Todd  &  Norman,  1996;
Perotti,  Todd,  Lappin  &  Phillips,  submitted),  and  the  results  confirm  that  observers'  perceptions 
are  primarily  determined  by  first  order  spatiotemporal  relations. 
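
The 1-parameter ambiguity of a 2-view analysis can be made concrete with a small numerical sketch. It assumes the simplest case, a rigid rotation about the vertical axis viewed under orthographic projection; the function names and the specific angles are hypothetical and are not taken from the cited experiments. For every assumed rotation angle there is a different 3D structure that reproduces both images exactly.

```python
import numpy as np

def rotate_about_vertical(points, theta):
    """Rotate (x, y, z) points about the vertical (y) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.column_stack((c * x + s * z, y, -s * x + c * z))

def orthographic(points):
    """Orthographic projection: keep (x, y) and discard depth."""
    return points[:, :2]

def structure_for_assumed_rotation(view1, view2, theta_assumed):
    """Solve x2 = x1*cos(theta) + z*sin(theta) for z at an assumed theta."""
    z = (view2[:, 0] - view1[:, 0] * np.cos(theta_assumed)) / np.sin(theta_assumed)
    return np.column_stack((view1, z))

rng = np.random.default_rng(1)
true_shape = rng.uniform(-1.0, 1.0, size=(12, 3))
theta_true = np.deg2rad(10.0)
view1 = orthographic(true_shape)
view2 = orthographic(rotate_about_vertical(true_shape, theta_true))

candidates = {}
for degrees in (5.0, 10.0, 20.0):
    theta = np.deg2rad(degrees)
    shape = structure_for_assumed_rotation(view1, view2, theta)
    # Check that this candidate structure regenerates the second view.
    consistent = np.allclose(orthographic(rotate_about_vertical(shape, theta)), view2)
    candidates[degrees] = (shape, consistent)
    depth_range = shape[:, 2].max() - shape[:, 2].min()
    print(f"assumed rotation {degrees:4.1f} deg: consistent={consistent}, "
          f"depth range={depth_range:.2f}")
```

Only the candidate computed with the true angle recovers the original depths. The others are equally consistent with the two images, so a third view or some additional constraint would be needed to choose among them.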

A  fundamental  assumption  for  most  computational  models  of  the  perception  of  structure 
from  motion  is  that  multiple  views  of  an  identifiable  image  feature  must  all  correspond  to  the 
same  physical  point  in  3-dimensional  space.  There  are  numerous  situations  encountered  in 
natural  vision,  however,  for  which  this  assumption  is  violated.  Consider,  for  example,  how  an 
object's  motion  influences  patterns  of  image  shading,  smooth  occlusions,  cast  shadows  or 
specular  highlights.  Because  the  optical  contours  produced  by  these  phenomena  do  not  remain
projectively  attached  to  fixed  locations  on  an  object's  surface,  the  changes  produced  by  motion 
would  be  interpreted  by  most  existing  models  as  nonrigid  distortions  of  3D  structure. 

There  is  a  growing  amount  of  evidence  to  suggest,  however,  that  this  is  not  how  they  are
interpreted  by  the  human  visual  system.  Todd  &  Norman  (1994)  have  shown  that  observers  can
identify  nonrigid  deformations  from  shadows  and  occlusion  contours  just  as  well  as  they  are  able
to  detect  the  nonrigid  deformations  of  textured  surfaces  with  identifiable  features.  This  result  has 
also  been  replicated  by  Norman,  Todd  &  Phillips  (1995)  and  Koenderink,  van  Doorn,  Todd,
Norman  &  Phillips  (1996)  using  judgments  of  local  depth  and  orientation  for  smoothly  curved 
surfaces  defined  by  the  optical  deformations  of  texture,  shading,  highlights  and  occlusion 
boundaries.  We  are  currently  trying  to  model  how  these  other  types  of  optical  deformations 
could  be  perceptually  analyzed  to  determine  an  object's  3-dimensional  form. 

The  Perception  of  Metric  Structure 

A  fundamental  prerequisite  for  any  computational  analysis  of  3D  form  perception  is  to 
define  how  an  object's  structure  is  perceptually  represented.  Although  most  existing
computational  models  are  designed  to  obtain  a  veridical  interpretation  of  an  object's  Euclidean
metric  structure,  there  have  been  a  few  recent  departures  from  this  approach  that  have
concentrated  instead  on  more  qualitative  aspects  of  3D  form.  Much  of  the  evidence  from  our 
laboratory  during  the  past  several  years  has  indicated  that  this  latter  approach  is  more 
representative  of  the  perceptual  processes  performed  by  actual  human  observers. 

One  technique  we  have  adopted  to  address  this  issue  involves  measuring  discrimination 
thresholds  for  various  metrical  properties  of  3D  structure.  For  example,  Norman,  Todd,  Perotti
&  Tittle  (1995)  obtained  Weber  fractions  of  approximately  2%  for  length  discriminations  in  the 
frontoparallel  plane,  but  these  thresholds  increased  to  over  25%  when  lines  were  presented  in 
arbitrary  3D  orientations  as  defined  by  motion  and  stereo.  We  have  also  obtained  similarly  high 
Weber  fractions  for  discrimination  of  depth  and  orientation  intervals  on  smoothly  curved  surfaces 
defined  by  shading,  texture,  motion  and  stereo  (see  Norman  &  Todd,  1995, 1996;  Reichel,  Todd
&  Yilmaz,  1995;  Todd  &  Norman,  1995).  We  are  currently  planning  to  expand  this  program  by
measuring  thresholds  for  interval  and  ordinal  relations  for  a  wide  variety  of  different  surface
properties  (e.g.,  depth,  slant,  tilt,  curvedness,  etc.)  that  could  potentially  form  the  primitive  units 
of  our  perceptual  representations  of  3D  form. 
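
For readers unfamiliar with the measure, a Weber fraction is the just-discriminable difference divided by the magnitude of the standard. The short sketch below converts the approximate fractions quoted above into absolute differences for a hypothetical 10 cm standard length; the standard length and the function names are illustrative assumptions, not values from the cited studies.

```python
def weber_fraction(standard, threshold_difference):
    """Weber fraction: just-discriminable difference relative to the standard."""
    if standard <= 0:
        raise ValueError("standard magnitude must be positive")
    return threshold_difference / standard

def discrimination_threshold(standard, fraction):
    """Smallest reliably discriminable difference for a given Weber fraction."""
    return standard * fraction

standard_length_cm = 10.0  # hypothetical standard, chosen only for illustration
conditions = (
    ("frontoparallel plane", 0.02),       # approximately 2%
    ("arbitrary 3D orientation", 0.25),   # 25%, the lower bound reported
)
for label, fraction in conditions:
    delta = discrimination_threshold(standard_length_cm, fraction)
    print(f"{label}: about {delta:.1f} cm difference needed")
```

Under these assumptions, a length difference would have to be more than ten times larger to be discriminated when the lines are oriented arbitrarily in depth than when they lie in the frontoparallel plane.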

In  addition  to  measuring  the  precision  of  observers'  judgments  using  discrimination 
paradigms,  we  have  also  performed  numerous  experiments  to  measure  any  systematic  distortions 
that  may  exist  in  the  perception  of  3D  structure  (see  Tittle,  Todd,  Perotti  &  Norman,  1995;  Todd,
Tittle  &  Norman,  1995;  Koenderink,  Kappers,  Todd,  Norman  &  Phillips,  in  press;  Norman,
Todd,  Perotti  &  Tittle,  in  press;  Todd,  Koenderink,  van  Doorn  &  Kappers,  1996).  We  have
employed  several  different  paradigms  for  this  purpose  using  both  real  objects  and  computer
generated  displays  as  stimuli.  The  general  pattern  of  results  in  all  of  these  studies  is  that
perceived  intervals  in  depth  tend  to  be  systematically  expanded  or  compressed  relative  to  those  in 
the  frontoparallel  plane.  The  precise  magnitude  of  this  depth  scaling  is  influenced  by  a  variety  of 
different  factors,  which  we  have  only  just  begun  to  explore. 

One  important  factor  that  can  influence  the  pattern  of  perceptual  distortion  is  the  particular 
combination  of  optical  properties  by  which  an  object's  structure  is  perceptually  specified  (e.g., 
shading,  texture,  motion  and/or  binocular  disparity).  In  many  of  our  experiments  we  have 
systematically  manipulated  the  available  sources  of  information  by  presenting  them  in  all 
possible  factorial  combinations  (see  Norman,  Todd  &  Phillips,  1995;  Tittle,  Todd,  Perotti  & 
Norman,  1995;  Todd,  Koenderink,  van  Doorn  &  Kappers,  in  press).  We  have  also  obtained
observers'  judgments  of  perceived  structure  for  displays  in  which  the  available  sources  of
information  are  in  conflict  with  one  another  (see  Norman  &  Todd,  1995;  Tittle,  Perotti  &
Phillips,  1995;  Tittle,  Perotti  &  Norman,  in  press).  The  results  of  these  studies  indicate  that
motion  and  stereo  are  the  dominant  sources  of  optical  information,  and  that  their  relative  weights 
can  vary  significantly  depending  upon  the  specific  aspect  of  3D  structure  an  observer  is  asked  to 
judge. 

The  Perception  of  Nonmetric  Structure 

Although  the  perception  of  3D  metrical  structure  can  be  highly  distorted  and  imprecise, 
observers  are  typically  quite  good  at  judging  ordinal  or  nominal  relations.  In  one  recent 
experiment,  for  example,  we  compared  performance  for  two  different  aspects  of  perceived 
relative  depth  on  smoothly  curved  surfaces  defined  by  shading,  texture,  motion  and  stereo.  For 
depth  interval  discriminations,  we  obtained  Weber  fractions  of  approximately  25%.  For  depth 
order  discriminations,  in  contrast,  the  Weber  fractions  were  reduced  to  less  than  1%.  Such 
findings  indicate  that  observers  may  have  a  very  accurate  perceptual  representation  of  the  depth 
order  relation  between  two  points  without  necessarily  knowing  the  precise  magnitude  of  their 
separation  in  depth. 

We  have  also  performed  a  number  of  recent  experiments  on  the  identity  relation  between 
multiple  views  of  an  individual  surface  point  when  it  is  observed  from  different  orientations  (see 
Koenderink,  van  Doorn,  Kappers  &  Todd,  in  press;  Phillips,  Todd,  Koenderink  &  Kappers,  in
press).  The  paradigm  we  have  developed  for  investigating  this  property  is  conceptually  quite 
simple.  Observers  are  presented  with  a  pair  of  objects  that  are  structurally  identical,  except  that 
they  have  different  random  textures  and  are  positioned  at  different  orientations  in  depth.  A  single 
point  on  one  of  the  objects  is  highlighted  with  a  small  colored  dot,  and  the  observer's  task  is  to 
identify  the  corresponding  point  at  a  different  orientation  on  the  second  object.  For  orientation 
differences  up  to  30  degrees,  the  average  errors  within  the  object's  projected  image  are  typically 
no  larger  than  a  few  minutes  of  arc!  We  are  currently  planning  an  additional  series  of
experiments  in  an  effort  to  reveal  how  the  view-point  invariant  identity  of  a  point  is  perceptually 
determined  from  various  aspects  of  optical  information. 